Playing Detective: Reconstructing Software Architecture
from Available Evidence

Abstract:  Because  a  system’s  software  architecture  strongly  influences  its 
ability  to  support  quality  attributes  such  as  modifiability,  performance,  and 
security,  it  is  important  to  be  able  to  analyze  and  reason  about  that  architecture. 
However,  architectural  documentation  frequently  does  not  exist,  and  when  it 
does,  it  is  often  out  of  sync  with  the  implemented  system.  In  addition,  it  is  rare 
that  software  development  begins  with  a  clean  slate;  systems  are  almost 
always  constrained  by  existing  legacy  code.  As  a  consequence,  we  need  to  be 
able  to  extract  information  from  existing  system  implementations  and  reason 
architecturally  about  this  information.  This  paper  presents  Dali,  an  open, 
lightweight  workbench  that  aids  an  analyst  in  extracting,  manipulating,  and 
interpreting  architectural  information.  By  assisting  in  the  reconstruction  of 
architectures from extracted information, Dali helps an analyst redocument
architectures and discover the relationship between “as-implemented” and
“as-designed” architectures.


1  Software  Architecture  as  Shared  Hallucination 

The formal study of software architecture has been a significant addition to the software
engineering repertoire in the 1990s. It has promised much to designers and developers: help with
the high-level design of complex systems; early analysis of high-level designs, particularly with
respect to their satisfaction of quality attributes such as modifiability, security, and
performance; higher level reuse such as that of designs; and enhanced stakeholder communication
[Garlan 93].^ These benefits seem enticing. However, much of the promise of software
architecture has as yet gone unfulfilled. Why is this?

Some  of  the  problems  simply  stem  from  the  fact  that  architectures  are  seldom  documented 
properly: 

• Many systems have no documented architecture at all. (All systems have an architecture,
but frequently it is not explicitly known or recorded by the developers and therefore
evolves in an ad hoc fashion.)

•  Architectures  are  represented  in  such  a  way  that  the  relationship  between  the 
architectural  representation  and  the  actual  system,  particularly  its  source  code,  is  unclear. 

•  In  systems  that  do  have  properly  documented  architectures,  the  architectural 
representations  are  frequently  out  of  sync  with  the  actual  system,  because  maintenance 
of  the  system  occurs  without  a  similar  effort  to  maintain  the  architectural  representation. 

We see these problems on a regular basis at the Software Engineering Institute (SEI) when
we do architectural evaluations. There is little completely new development. Development is
typically constrained by compatibility with, or use of, legacy systems. And it is rare that such
systems have an accurately documented architecture. Because of these issues, we have a
serious problem in assessing architectural conformance, and if we cannot assess architectural
conformance (that is, if we cannot ensure that our architectural documentation matches what
we have implemented), what good is having an architecture? If we cannot confidently
establish the relationship between the documented and implemented architectures, much of the
value of having an architecture is lost.

^ See also Bass, Len; Clements, Paul; & Kazman, Rick. Software Architecture in Practice. Reading, MA:
Addison-Wesley, 1997 (this document is currently in press).

In addition, when a system enters the maintenance portion of its life cycle, it may sustain
modifications that alter its architecture. Hence, a second problem arises: How do we know that
maintenance operations are not eroding the architecture, breaking down abstractions,
bridging layers, compromising information hiding, and so forth?

All of these are manifestations of two underlying causes. The first is that a system does not
have “an architecture.” It has many: its runtime relationships, data flows, control flows, code
structure, and so on. The second, more serious, cause is that the architecture that is
represented in a system’s documentation may not coincide with any of these views. “The
architecture” is frequently some abstracted runtime view of the system. For example, even though a
system is described as “layered,” the location and boundaries of the layers are not obvious
from an examination of any of the architectural views.

Quite simply, there is no accepted way of enforcing a “layer” in a system’s
implementation: there is no explicit “layer” construct in any modern programming language. Layering may
be realized by programming language constructs such as modules or inheritance with
accessibility constraints. While these support layering, they in no way enforce it. Attempts to enforce
layering are typically made through other means such as naming conventions (e.g., all
functions defined by the X windows library, Xlib, begin with the letter “X”), code ownership (the
graph layout layer is owned by a single development group, and only they can change it), and
design conventions (the graph layout layer cannot directly call the window system, but must
instead call a virtual toolkit layer). For example, the “operating system” layer seen in many
architectural diagrams is only a layer by virtue of code ownership: few designers or developers
have access to its source. So, designers must treat it as a sealed layer. This is the best case.
In many cases we have access to all of the source code in our architecture. The architecture
as a whole does not exist in any artifact that we actually implement.

So, is software architecture a shared hallucination that we, as developers, gladly and glibly
subscribe to? If not, how do we know what the architecture of a system is? How do we validate this?
How do we measure the conformance of an “as-implemented” architecture against its
“as-designed” architecture? If we can’t answer these questions then our use of software architecture
amounts to little more than a vague vision and blind faith in the abilities of the original
designers and their successors.

In this paper we present a prototype system, Dali, for helping a user reason about an
implemented architecture. Dali is an interactive system that aids the user in interpreting architectural
information that has been extracted automatically. The system does not attempt to do it all,
which is to say that it does not attempt to automatically “find” the architecture for the user. This
approach has been tried before and has failed. Rather, Dali supports the user in defining
architectural patterns and in matching those patterns to extracted information.

CMU/SEI-97-TR-010

There  are  three  techniques  used  in  reconstructing  architectural  views  using  Dali: 

1. Architectural extraction, which captures the as-implemented architecture from source
artifacts such as code and makefiles. We can augment this static information with output
from analysis tools that capture a system’s dynamic behavior (such as profilers or test
coverage tools).

2. User-defined architectural patterns that collectively link the as-implemented architecture
to the as-designed architecture. The as-designed architecture consists of the kinds of
abstractions used in architectural representations (subsystems, high-level components,
repositories, layers, conceptually related functionality, and so forth). The architectural
patterns explicitly link these abstractions to the information extracted in technique 1.

3. Visualization of the resulting architecture (the extracted information, as organized by the
patterns) for validation by the user.

Each of these techniques on its own is insufficient to address the problem of architectural
extraction. The architectural extraction and visualization techniques used here are not in
themselves new. The solution to “finding” an architecture from extracted artifacts derives from the
synergy of the parts and from our model of user interaction through pattern matching and
interpretation. This is the main contribution of our work.

In  this  paper  we  describe  how  Dali  is  used  and  the  kinds  of  insights  you  can  get  from  using  it. 
We  exemplify  its  use  through  assessments  of  two  systems:  VANISH  [Kazman  94],  a  system 
for  prototyping  visualizations,  and  UCMEdit,  a  system  for  creating  and  editing  Buhr-style  use 
case  maps  [Buhr  96]. 

In  summary,  our  goals  with  Dali  are  to 

•  support  architectural  analysis,  which  implies  the  need  to 

•  redocument  architectures 
2 Hello Dali: An Extraction/Analysis Workbench

Because there is a great deal of variance in languages, architectural styles, implementation
conventions, and so forth, we believe that no single collection of tools will suffice for all
architectural extraction and analysis. Thus, in creating support for extraction and analysis we have
created an open, lightweight “workbench”: an environment that provides an infrastructure for
opportunistic integration of a wide variety of tools and techniques. New elements must be easy
to integrate into the workbench (openness), and such integration should not unnecessarily
impact other elements of the workbench (dependencies are lightweight).

The following sections describe the components of the Dali workbench, as illustrated in Figure
2-1. The workbench is first discussed in general terms, leaving elements of the workbench that
may  vary  between  applications  (such  as  particular  extraction  techniques,  analysis  tools,  or 
manipulation  techniques)  unspecified.  Section  2.5  then  discusses  a  particular  population  of 
the  workbench. 


Figure 2-1: The Dali Workbench


2.1  Concrete  Model  Extraction:  Gathering  Clues 

A necessary first step in supporting the analysis and evaluation of a software architecture is
the extraction of a concrete model, a representation of the implemented system. Such a
representation contains a collection of elements (e.g., functions, files, variables, objects), a
collection of relations between the elements (e.g., “function calls function,” “file contains
function”), and a set of attributes of these elements and relations (e.g., “function calls function
N times,” “object A has type B”). A concrete model may reflect several views of a system, such
as its static structure, dynamic (runtime) nature, or build-time structure.
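
The three ingredients of a concrete model (elements, relations, and attributes) can be pictured as a small data structure. The following Python sketch is illustrative only and is not Dali's own representation; the class names `Element`, `Relation`, and `ConcreteModel`, and the `times` attribute, are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Element:
    name: str
    kind: str          # e.g. "function", "file", "variable", "object"


@dataclass(frozen=True)
class Relation:
    kind: str          # e.g. "calls", "contains"
    source: str        # name of the "from" element
    target: str        # name of the "to" element


@dataclass
class ConcreteModel:
    elements: set = field(default_factory=set)
    relations: set = field(default_factory=set)
    # Attributes are stored apart from the elements and relations they describe.
    attributes: dict = field(default_factory=dict)

    def add_element(self, name, kind, **attrs):
        element = Element(name, kind)
        self.elements.add(element)
        if attrs:
            self.attributes.setdefault(element, {}).update(attrs)
        return element

    def add_relation(self, kind, source, target, **attrs):
        relation = Relation(kind, source, target)
        self.relations.add(relation)
        if attrs:
            self.attributes.setdefault(relation, {}).update(attrs)
        return relation


# A tiny hypothetical model: file x.c contains f, and f calls g three times.
model = ConcreteModel()
model.add_element("x.c", "file")
model.add_element("f", "function")
model.add_element("g", "function")
model.add_relation("contains", "x.c", "f")
calls_fg = model.add_relation("calls", "f", "g", times=3)
```

Keeping attributes in a separate mapping means that a static view and a dynamic (profile) view can annotate the same relation without changing its identity.
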

There are many techniques and tools for static source model extraction, and these are largely
divisible into two classes: those based on parsing and those based on lexical techniques.
Lexical techniques are usually more versatile and lightweight than parse-based techniques, but
lexical techniques typically achieve lower accuracy. Regardless, it is important to appreciate
that no one tool will successfully extract a complete source model; first, tools are designed to
extract particular source elements rather than comprehensive models, and second, extraction
tools frequently produce output that does not accurately reflect the source corpus [Murphy
96a].

Imperfect extractors may appear to be acceptable when considering systems from the
architectural perspective: Why should one missed function call disturb the high-level model?
However, this is a dangerous assumption as it will not be a single function call that is missed, but
more likely a whole class of related elements or relations. This deficiency will likely affect the
architectural model. So, what are we to do? We propose that composition of multiple extraction
techniques will alleviate these problems by providing a concrete model of higher accuracy than
any individual technique.

In the simplest case, extraction techniques will be directed toward different views, that is, toward
disjoint sets of elements and relations (e.g., one technique for extraction of function calls and another
for extraction of variable access). Composition is then simply a matter of constructing the
union of the concrete models; however, this does not address the potential deficiencies of any
individual technique. Composition of techniques that are not intended to generate disjoint
models requires that one address several issues. First and foremost is that of conflicts
between models: the situation when one extractor identifies a particular element or relation, and
another extractor does not. The simplest (though potentially incorrect) solution is to generate
a union in this case as well, resulting in a concrete model that incorporates false positives from
each contributing model. An alternate solution, analogous to software fault tolerance, is to
combine multiple overlapping models with “voting,” where an element or relation is included in
the composite model if it appears in a majority of the contributing models, or weighted voting,
where the votes of some models weigh more heavily than others with respect to specific extracted
artifacts.
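
The three composition strategies just described (union, majority voting, and weighted voting) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the mechanism used in Dali: the function name `compose_by_voting`, the representation of each extracted model as a set of tuples, and the three sample extractors are all hypothetical.

```python
from collections import defaultdict


def compose_by_voting(models, weights=None, threshold=0.5):
    """Keep a relation when the models reporting it hold more than
    `threshold` of the total weight (equal weights give majority voting)."""
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)
    score = defaultdict(float)
    for model, weight in zip(models, weights):
        for relation in model:
            score[relation] += weight
    return {relation for relation, s in score.items() if s > threshold * total}


# Three hypothetical extractors that disagree about two call relations.
lexical = {("calls", "f", "g"), ("calls", "g", "h")}
parser = {("calls", "f", "g"), ("calls", "f", "h")}
profiler = {("calls", "f", "g"), ("calls", "g", "h")}
models = [lexical, parser, profiler]

union = set().union(*models)                                  # keeps every claim
majority = compose_by_voting(models)                          # one model, one vote
weighted = compose_by_voting(models, weights=[1.0, 3.0, 1.0]) # trust the parser more
```

With equal weights the relation reported only by the parser is dropped; tripling the parser's weight reverses that outcome, which is the effect weighted voting is meant to provide.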

This model of composition of extraction techniques contributes to the open, lightweight nature
of Dali. The method for integration of a new extraction technique is typically trivial and at worst
requires an analysis of the characteristics of the new extractor. Because extractors are
ignorant of each other, integration of a new technique does not imply that the existing extractors
need to reanalyze the source corpus. For these reasons, a single all-encompassing extractor
is not necessary: instead, individual lightweight extractors are incorporated into the Dali
workbench opportunistically.


2.2  A  Repository  Hybrid 

Once a concrete model is extracted it must be stored. We considered two options for this
storage: use of a database that manages all access to the model and use of an interchange format
that decentralizes model access. An interchange format provides more flexibility in a multitool
environment, but involves significant up-front effort for definition. As the intention of the
workbench approach is to apply tools opportunistically, the use of a centralized database
complicates model maintenance; each tool must have an interface to the database and be
responsible for updating the database with any modifications that the tool makes. The
interchange format, then, appears to be the solution. On the other hand, a database provides
handy functionality with respect to querying, multiuser support, and history mechanisms.

With Dali, we have adopted a hybrid approach in which we use an SQL (structured query
language) database for primary model storage, but application-specific file formats for
interchange between tools. Tools may thus either access the database directly, via programmatic
interfaces (a less open solution), or depend on provision of data files in appropriate formats.
This scheme allows users to enjoy the advantages of having a repository, but alleviates the
burden of model maintenance by all participating tools. Once a tool has manipulated the
model, the repository is updated. This approach is discussed further in Section 2.4.3.

2.3  Derived  Relations 

Extraction of information is the foundation of architecture reconstruction. But it is also
important to augment the collection of relations stored by the repository. Figure 2-2 illustrates an
example from socket-based interprocess communication (IPC).


The circles represent extracted elements from the concrete model: functions (f, g, connect,
and bind), files (x.c and y.c), and processes (p and q). Solid arrows represent relations,
labelled with the type of relation and, in the case of calls, tagged with attributes (SERVER). The
dashed line, labelled communicates_with, depicts a derived relation. That is, this relation did
not exist in any of the extracted information; it had to be inferred from a collection of extracted
relations. Process q acts as a server because it is built from a file containing a function that calls
bind; p is a client because it is built from a file containing a function that calls connect. We know
that p and q are a client/server pair because the arguments to connect and bind are identical.
Thus, p communicates with q and vice versa. This relationship could have been defined
asymmetrically instead, for example, as “p connects_to q.”

Derived relationships of this type provide a mechanism for abstraction over the extracted
artifacts, thus creating new views of the architecture. For this reason, flexible relationship
derivation is an important feature in Dali. This functionality is facilitated by the off-the-shelf SQL
database that provides Dali’s central repository [Stonebraker 90].
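
To make the derivation concrete, the socket example can be expressed as a single SQL join. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for the repository; the table and column names (`calls`, `contains_fn`, `built_from`, `communicates_with`, and their fields) are assumptions made for illustration, as is the assignment of f to x.c and g to y.c.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Extracted relations (hypothetical schema).
    CREATE TABLE calls       (caller TEXT, callee TEXT, arg TEXT);
    CREATE TABLE contains_fn (file_name TEXT, fn_name TEXT);
    CREATE TABLE built_from  (proc_name TEXT, file_name TEXT);
    -- Derived relation, initially empty.
    CREATE TABLE communicates_with (client TEXT, server TEXT);

    INSERT INTO calls VALUES ('f', 'connect', 'SERVER'), ('g', 'bind', 'SERVER');
    INSERT INTO contains_fn VALUES ('x.c', 'f'), ('y.c', 'g');
    INSERT INTO built_from VALUES ('p', 'x.c'), ('q', 'y.c');
""")

# A process is a client if one of its functions calls connect, a server if one
# calls bind; the two are paired when the call arguments are identical.
conn.execute("""
    INSERT INTO communicates_with (client, server)
    SELECT DISTINCT cb.proc_name, sb.proc_name
    FROM calls AS c
    JOIN calls AS s        ON s.arg = c.arg
    JOIN contains_fn AS cf ON cf.fn_name = c.caller
    JOIN contains_fn AS sf ON sf.fn_name = s.caller
    JOIN built_from AS cb  ON cb.file_name = cf.file_name
    JOIN built_from AS sb  ON sb.file_name = sf.file_name
    WHERE c.callee = 'connect' AND s.callee = 'bind'
""")

pairs = conn.execute("SELECT client, server FROM communicates_with").fetchall()
```

Storing the result as (client, server) corresponds to the asymmetric reading of the relation; a symmetric communicates_with could be obtained by also inserting each pair in the reverse order.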

2.4  Model  Manipulation:  Organizing  the  Evidence 

Perhaps the most important component of the Dali workbench is its interaction element, the
central component that is used to directly manipulate the model and guide analyses and
automatic manipulation. This component is best realized by a flexible tool that effectively
balances generality with domain applicability. For example, a generic graphics package would
provide a great deal of generality, but little or no functionality specific to the domain of
software. A tool such as DISCOVER [DISCOVER 97], on the other hand, provides significant
domain functionality but little flexibility. We have found Rigi [Wong 94] to be a satisfactory
compromise between these competing concerns: Rigi provides generality via a control
language based on Tcl, called RCL (Rigi Command Language). RCL also provides functionality
specific to the tasks of manipulation of software models, satisfying the domain applicability
concern.

While Rigi in fact comprises both an extraction component (a parser) and a user interface
component, and while both of these are applicable within the Dali workbench, we are currently only
using Rigi’s user interface. The Rigi user interface (rigiedit) at its most basic level provides
graph editing functionality: layout, annotation, subgraph collapsing, and so forth. At a higher
level, Rigi provides semi-automatic facilities for subsystem identification based on
graph-theoretic properties such as interconnection strength. In experimenting with Rigi’s more advanced
functionality, we did not find that the automatically identified subsystems corresponded with
the architectural components we wished to identify. Our alternative to using automatic
architectural discovery techniques is to allow direct manipulation of the model augmented by
external manipulation and analysis tools.

2.4.1  Direct  Manipulation 

Notwithstanding  any  foreseeable  advances  in  automatic  manipulation  or  pattern  matching  of 
software  models,  it  is  necessary  to  provide  a  direct  manipulation  interface  for  users.  Facilities 
such  as  those  mentioned  above  (layout,  annotation,  subgraph  manipulation)  enable  a  user  to 
impose  their  interpretation  of  the  architecture  on  the  model  being  manipulated. 

Designers have standard ways in which they visually organize the components of their
architecture to correspond to their understanding of some property of the system (e.g., top-down
control flow or bottom-up data flow). So, it is helpful to be able to visually reorganize the
currently visible set of components arbitrarily to reflect this understanding.
2.4.2 External Manipulation and Analysis

Dali  supports  the  ability  to  apply  tools  to  the  current  architectural  model  opportunistically.  This 
is  achieved  through  a  three-step  process: 

1. exporting the model (or some subset thereof) from Rigi

2.  applying  an  appropriate  tool  to  manipulate  or  analyze  the  model 

3.  (optionally)  importing  the  result 

The simplest example of this process exists within Rigi by default: Rigi can export a snapshot of
its current model in a graph representation language, execute an external graph layout
algorithm, and then import the result. We have extended this approach to incorporate tools that
structurally alter the model. We do this by implementing, in RCL, glue code that synchronizes
Rigi’s model with any changes to the model from the application of external tools.
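
The export-process-import cycle can be sketched independently of Rigi. In the Python fragment below, everything is an assumption made for illustration: the one-triple-per-line snapshot format is not claimed to be Rigi's export format, the helper names are hypothetical, and the "external tool" is a small stand-in script (run as a separate process) that drops every relation mentioning printf.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for an external tool: echoes every triple that does not mention printf.
TOOL = (
    "import sys\n"
    "for line in open(sys.argv[1]):\n"
    "    if 'printf' not in line.split():\n"
    "        sys.stdout.write(line)\n"
)


def export_model(triples, path):
    """Step 1: write the model as one 'kind source target' triple per line."""
    path.write_text("".join(f"{k} {s} {t}\n" for k, s, t in sorted(triples)))


def import_model(text):
    """Step 3: read the tool's output back into a set of triples."""
    return {tuple(line.split()) for line in text.splitlines() if line.strip()}


def export_process_import(triples):
    with tempfile.TemporaryDirectory() as tmp:
        snapshot = Path(tmp) / "model.txt"
        export_model(triples, snapshot)
        # Step 2: apply the external tool to the exported snapshot.
        result = subprocess.run(
            [sys.executable, "-c", TOOL, str(snapshot)],
            capture_output=True, text=True, check=True,
        )
        return import_model(result.stdout)


model = {("calls", "f", "g"), ("calls", "f", "printf"), ("contains", "x.c", "f")}
filtered = export_process_import(model)
```

Because the tool sees only a file, it needs no knowledge of the viewer or the repository, which is what keeps this style of integration lightweight.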

The central technique for external model manipulation within Dali is based on queries over the
SQL database that stores the model. Although the SQL query language does not provide
sufficient expressive power to describe constructs such as the transitive closure of a relation, we
have found that it is sufficient to allow model manipulation based on structural relationships as
well as attributes of elements and relations. This is because our model of manipulation is
iterative, and depends on the user to define higher order relations.
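
One way to see how iteration compensates for the missing closure operator is to apply a plain, non-recursive join repeatedly until it derives nothing new. The sketch below does this with Python's sqlite3 module as a stand-in database; the `calls` and `reaches` tables and the `derive_reaches` helper are hypothetical names, and in Dali the iteration is driven by the user rather than by a loop like this one.

```python
import sqlite3


def derive_reaches(conn):
    """Fill `reaches` with the transitive closure of `calls`, one join per pass.
    Returns the number of passes needed to reach a fixed point."""
    conn.execute("CREATE TABLE reaches AS SELECT DISTINCT caller, callee FROM calls")
    passes = 0
    while True:
        before = conn.execute("SELECT COUNT(*) FROM reaches").fetchone()[0]
        # Extend every known path by one more call, skipping pairs already stored.
        conn.execute("""
            INSERT INTO reaches (caller, callee)
            SELECT DISTINCT r.caller, c.callee
            FROM reaches AS r JOIN calls AS c ON r.callee = c.caller
            WHERE NOT EXISTS (SELECT 1 FROM reaches AS known
                              WHERE known.caller = r.caller
                                AND known.callee = c.callee)
        """)
        after = conn.execute("SELECT COUNT(*) FROM reaches").fetchone()[0]
        passes += 1
        if after == before:      # the last pass derived nothing new
            return passes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
conn.executemany("INSERT INTO calls VALUES (?, ?)",
                 [("a", "b"), ("b", "c"), ("c", "d")])
passes = derive_reaches(conn)
closure = set(conn.execute("SELECT caller, callee FROM reaches"))
```

Each pass is an ordinary query of the kind the text describes; only the surrounding loop, supplied from outside the query language, makes the result a closure.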

These query-based manipulations are the basis for the architectural patterns that collectively
relate an as-built architecture to an as-designed architecture. In addition, they provide the
foundation for one mechanism of architectural pattern matching. These applications of
query-based manipulation will be discussed in Section 3.

One goal of reasoning architecturally about a system is to support the ability to analyze an
architecture for properties such as availability, performance, security, schedulability, and so
forth. The literature provides many techniques for such analyses and many tools have been
developed to perform them [Kazman 96a], [Smith 93]. Dali provides a mechanism by which
analyses can be performed on the architectural model currently being explored, using the
export-process-import model described above. Integration of a new technique for which an
analysis tool exists is simply a matter of implementing appropriate translators to produce a format
acceptable to the tool and to interpret its results (if appropriate). We have done this with
Interactive Architecture Pattern Recognition (IAPR) [Kazman 96c], a tool that determines the
presence of patterns in a software architecture, and with RMTool [Murphy 95], a tool that measures
architectural conformance. Each of these generates output that is viewed by other external
tools.

2.4.3  Repository  Synchronization 

The concrete model within Rigi is a local “working copy” of the model stored within the SQL
database. It is important to recognize the potential difficulties resulting from lack of
synchronization between this local working version and the central repository. The most significant
example is the query-based manipulation mechanism outlined above, as a query over the SQL
database may manipulate the model in ways that conflict with what is currently being displayed
by Rigi.

Repository synchronization is achieved through additional RCL code that updates the SQL
database with modifications made to the Rigi model (those made by external tools or by direct
user interaction).

2.5  Tools  of  the  Trade:  Populating  the  Dali  Workbench 

The Dali workbench, as described, specifies only one constraint: the use of a database for
central model storage. The other elements of the system (extraction techniques, methods for
combination of extracted data, the visualization system and its interaction, and particular
analysis or manipulation tools) are left unspecified. This is in theory. In practice, we have specified
tools, as illustrated in the preceding discussion of Rigi for visualization and interaction. We
currently populate the rest of the workbench as follows:

•  Lightweight  Source  Model  Extraction  (LSME)  [Murphy  96b],  Imagix  [Imagix  97],  make, 
and  Perl  [Wall  91]  for  extraction  of  source  model  information  for  C  and  C++ 

•  gprof  for  extraction  of  dynamic  (profile)  information 

• PostgreSQL (based on POSTGRES [Stonebraker 90]) for model storage

• IAPR [Kazman 96c], RMTool [Murphy 95], and Perl for analysis and manipulation

As Dali has an open, lightweight architecture, replacement of any of these components or
inclusion of new components is intended, and has proven to be, straightforward.
3 Playing Detective: Dali on the Streets

This section describes the application of Dali to the architectural reconstruction of two
systems, both implemented in C++: VANISH [Kazman 96b], a 50-KLOC system for prototyping
visualizations, and UCMEdit, a 15-KLOC system for creating and editing Buhr-style use case
maps [Buhr 96]. We liken this process to detective work; we gather evidence, pose hypotheses
that organize the evidence, and view and interpret the resulting organization. We iterate
through this process until we are satisfied with the results. What does it mean to be satisfied
with an architectural representation? We discuss this in Section 4.

3.1  Extraction 

Extraction of static source models for both VANISH and UCMEdit was performed using LSME
[Murphy 96b]. To apply LSME, you provide a set of patterns specified as regular expressions
and a set of actions written in the Icon programming language [Griswold 83]. Actions are
executed when patterns match elements of a source corpus. To perform extraction of elements
from C++ using LSME, a set of patterns and actions was developed.
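
The pattern-and-action style of lexical extraction can be illustrated with a short Python sketch. This is not LSME syntax and not the pattern set developed for Dali; the two regular expressions, the keyword list, and the `extract_calls` function are assumptions chosen to show the idea, and they handle only simple C-style function definitions.

```python
import re

KEYWORDS = {"if", "for", "while", "switch", "return", "sizeof"}
# Pattern 1: a function definition such as "void f() {" starting in column 0.
DEFINITION = re.compile(r"^[A-Za-z_][\w\s\*]*?\b([A-Za-z_]\w*)\s*\([^;{]*\)\s*\{")
# Pattern 2: anything that looks like a call, i.e. an identifier followed by "(".
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")


def extract_calls(source):
    """Return ("calls", caller, callee) triples found by purely lexical matching."""
    relations = set()
    current = None                      # most recently seen function definition
    for line in source.splitlines():
        match = DEFINITION.match(line)
        if match:
            current = match.group(1)    # action: remember the enclosing function
            line = line[match.end():]
        if current is None:
            continue
        for callee in CALL.findall(line):
            if callee not in KEYWORDS:  # action: emit a relation
                relations.add(("calls", current, callee))
    return relations


SAMPLE = """
int g(int x) {
    return x + 1;
}

void f() {
    if (g(2) > 0) {
        printf("%d", g(3));
    }
}
"""
found = extract_calls(SAMPLE)
```

The sketch also shows why lexical extractors trade accuracy for simplicity: C++ member definitions such as `void Foo::bar() {` are not recognized by the definition pattern, so their calls would be attributed to the previous function.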

The elements and relations that were extracted are shown in Table 3-1.


Table 3-1: Elements and Relations Extracted from VANISH and UCMEdit

                 “From” Element                  “To” Element
Relation         Element Type   Element Name     Element Type      Element Name
---------------  -------------  ---------------  ----------------  ------------
calls            function       tCaller          function          tCallee
contains         file           tContainer       function          tContainee
defines          file           tFile            class             tClass
has_subclass     class          tSuperclass      class             tSubclass
has_friend       class          tClass           class             tFriend
defines_fn       class          tDefined_by      function          tDefines
has_member       class          tClass           member variable   tMember
defines_var      function       tDefiner         local variable    tVariable
has_instance     class          tClass           variable          tVariable
defines_global   file           tDefiner         global variable   tVariable

It is important to note that variable accesses are not included in Table 3-1; that is, there are no
“function reads variable” or “function assigns variable” relations. LSME was not designed with
these relations in mind. A second extraction technique, based on the Imagix [Imagix 97] C++
parser, was incorporated to accomplish the extraction of this disjoint set of relations. The
Imagix C++ parser (a component of the Imagix 4D program-understanding tool) generates rich and
detailed output, including variable accesses, in an easy-to-process format. A simple Perl
script post-processes this output, generating the relations of interest. Additional “file depends
on file” relations are extracted by processing the output from running the GNU make utility on
the application’s makefile.

Once the concrete model of interest was extracted, functions thought to be “uninteresting”
were filtered out; among these are built-in functions, such as return, and standard C library
functions such as scanf and printf. Next, an SQL database was populated with the extracted
relations. Two additional database tables, relations and components, were defined for
convenience; the former identifies all defined relation types, and the latter identifies all defined
components. The components table has an additional field (called type) that stores the
component’s type (file, function, etc.).
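
The filtering and population steps can be sketched as follows, again using Python's sqlite3 module as a stand-in for the SQL database. Only the `relations` and `components` tables and the `type` field come from the description above; the `tName` column, the `build_repository` helper, the short `UNINTERESTING` list, and the sample data are assumptions made for illustration, and only two of the extracted relations are shown.

```python
import sqlite3

# Functions the text calls "uninteresting"; a real list would be longer.
UNINTERESTING = {"return", "scanf", "printf"}


def build_repository(calls, contains):
    """calls: (caller, callee) pairs; contains: (file, function) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE calls      (tCaller TEXT, tCallee TEXT);
        CREATE TABLE "contains" (tContainer TEXT, tContainee TEXT);
        CREATE TABLE relations  (tName TEXT);
        CREATE TABLE components (tName TEXT, type TEXT);
    """)
    # Filter before populating, so the repository never sees the noise.
    kept_calls = [(a, b) for a, b in calls
                  if a not in UNINTERESTING and b not in UNINTERESTING]
    kept_contains = [(f, fn) for f, fn in contains if fn not in UNINTERESTING]
    conn.executemany("INSERT INTO calls VALUES (?, ?)", kept_calls)
    conn.executemany('INSERT INTO "contains" VALUES (?, ?)', kept_contains)
    conn.executemany("INSERT INTO relations VALUES (?)",
                     [("calls",), ("contains",)])
    # Every element mentioned by a kept relation becomes a typed component.
    components = {(fn, "function") for pair in kept_calls for fn in pair}
    components |= {(fn, "function") for _, fn in kept_contains}
    components |= {(f, "file") for f, _ in kept_contains}
    conn.executemany("INSERT INTO components VALUES (?, ?)", sorted(components))
    return conn


repo = build_repository(
    calls=[("main", "draw"), ("main", "printf")],
    contains=[("main.cc", "main"), ("main.cc", "draw")],
)
functions = repo.execute(
    "SELECT tName FROM components WHERE type = 'function' ORDER BY tName"
).fetchall()
```

Having the `components` table carry a type lets later queries select, for example, all functions or all files without consulting each relation table in turn.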

3.2  There’s  an  Architecture  in  There? 

The process of manipulating the concrete model to derive the as-built architecture of the
system is an iterative, interactive, and interpretive one. It requires the interaction of not just any
person, but of a person familiar with the system. This interaction consists of alternating
pattern-definition and pattern-recognition activities.

The end result will be a representation of information extracted from the as-built architecture,
as organized by the analyst/detective. Clearly, it is possible to group source elements into
architectures in a huge variety of ways. This is why the task must be done by someone familiar
with the system’s design. Recall that we are interested in architectural conformance, which
means that an as-designed architecture exists, even if it is only in the architect’s mind.

But how do you get there from here? How do you get from a mountain of evidence to a concise,
accurate representation of the architecture? We illustrate this process by walking through a
typical set of pattern applications in Dali that move an analyst from the raw data that is a
concrete model to a (hopefully) simple, elegant software architecture.

Consider Figure 3-1, which shows the raw extracted concrete model of UCMEdit that contains
830 nodes and 2507 relations. (The corresponding image for VANISH would not be recognizably
different with 2844 nodes and 7387 relations.) This is the starting point of the architecture
reconstruction process.

Figure 3-1 A Raw Concrete Model: White Noise


The following sections describe the use of application-independent patterns to transform the
models of UCMEdit and VANISH and discuss the use of patterns leveraging architectural
information common to both applications. The examples will conclude with the application of
direct manipulations and patterns specific to each system.

3.3  Application-Independent  Patterns 

The first step toward reconstructing a system’s architecture is to apply several simple low-level,
application-independent patterns to augment and simplify the raw concrete model with
some derived information. Patterns are specified as sets of SQL queries; a Perl wrapper acts
as the “glue” for importation of the results into Rigi. The first pattern set is used to identify the
types of source elements, as shown in Figure 3-2. A pattern set contains a series of patterns,
where each pattern is made up of an SQL query and a Perl expression. The former selects a
set of elements from the concrete model, and the latter post-processes the query result, extracting
and manipulating important fields. For example, consider the second group of patterns in
Figure 3-2 that specifies the element type “class.” This group is made up of three patterns used
to identify sets of elements that are classes. These sets may or may not overlap with each other.


Note that this step is not necessarily required for all extracted concrete models; some
extracted concrete models already contain typing information.

The first of the class patterns selects all “definers” from the defines_fn relation. That is, any
element that is the definer in a “defines function” relation is a class. The second pattern selects
all components that “have instances,” and the third selects all components that are either a
superclass or a subclass. The Perl expressions associated with each of these patterns simply
generate output of the form “null <component> Class,” identifying <component> as a
Class. (The first field will be discussed further below.) The output is processed by Rigi and
used to update its internal model.


# File type.
SELECT tName
FROM components
WHERE tName LIKE '%.h'
OR tName LIKE '%.cc';

print "null $fields[0] File\n";

# Class type.
SELECT DISTINCT d1.tDefined_by
FROM defines_fn d1;

print "null $fields[0] Class\n";

SELECT DISTINCT i1.tVariable
FROM has_instance i1;

print "null $fields[0] Class\n";

SELECT DISTINCT c1.tName
FROM components c1, has_subclass s1
WHERE c1.tName=s1.tSuperclass
OR c1.tName=s1.tSubclass;

print "null $fields[0] Class\n";

# Member variable.
SELECT DISTINCT h1.tMember
FROM has_member h1;

print "null $fields[0] MemberVariable\n";

# Local variable.
SELECT DISTINCT d1.tVariable
FROM defines_var d1;

print "null $fields[0] LocalVariable\n";

# Global variable.
SELECT DISTINCT d1.tVariable
FROM defines_global d1;

print "null $fields[0] GlobalVariable\n";

Figure 3-2 Patterns for Element Types
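
To make the pairing of a query with a post-processing expression concrete, the following sketch runs a few typing patterns against a toy database and emits records of the form “null <component> <type>”. It is an illustration only: Python stands in for the Perl wrapper, SQLite for the SQL database, and the sample rows and the helper name apply_type_patterns are assumptions. The '%.h' file suffix is likewise assumed from the later discussion of header files.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE components (tName TEXT, tType TEXT);
CREATE TABLE defines_fn (tDefined_by TEXT, tDefines TEXT);
CREATE TABLE has_subclass (tSuperclass TEXT, tSubclass TEXT);
INSERT INTO components (tName)
    VALUES ('main.cc'), ('List'), ('Path'), ('Segment'), ('draw');
INSERT INTO defines_fn VALUES ('List', 'append');
INSERT INTO has_subclass VALUES ('Path', 'Segment');
""")

# Each pattern pairs an SQL query with the type name used in the output record.
TYPE_PATTERNS = [
    ("SELECT tName FROM components "
     "WHERE tName LIKE '%.h' OR tName LIKE '%.cc'", "File"),
    ("SELECT DISTINCT tDefined_by FROM defines_fn", "Class"),
    ("SELECT DISTINCT c1.tName FROM components c1, has_subclass s1 "
     "WHERE c1.tName = s1.tSuperclass OR c1.tName = s1.tSubclass", "Class"),
]

def apply_type_patterns(db, patterns):
    """Run every query and format one typing record per selected element."""
    records = []
    for sql, type_name in patterns:
        for fields in db.execute(sql):
            records.append(f"null {fields[0]} {type_name}")
    return records

records = apply_type_patterns(db, TYPE_PATTERNS)
```

In this toy model, draw is selected by no pattern, so it would keep the default type, as described in the text that follows.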


Any  elements  that  have  not  been  assigned  a  type  by  this  pattern  set  will  retain  Rigi’s  default 
type  of  “Function.” 

At this point, we have a database synchronization problem, as discussed in Section 2.4.3; the
type information that we have just derived is stored only in the Rigi model, not in the SQL
database. To synchronize the two models, a Rigi RCL script is executed to export the derived
types back into the database.

Thus far, the concrete model as displayed in Rigi will be indistinguishable from Figure 3-1. We
have not yet made any structural modifications to the model; we have only identified the types
Function, File, Class, LocalVariable, and GlobalVariable. The next collection of
low-level patterns applied to the model groups functions together with any variables that they
declare locally.

Figure 3-3 shows the pattern set for function aggregation. This pattern set has the effect of
incorporating a function and all of the local variables that it defines into a new composite
component. This new component has the original function’s name appended with a ‘+’. The
following Perl expression specifies the aggregation:

print "$fields[0]+ $fields[0] Function\n";

In this expression, $fields[0]+ identifies the name of the new composite; $fields[0] is
the name of the original function; Function is the type of the new composite.


# Function grouping.
SELECT tName
FROM components
WHERE tType='Function';

print "$fields[0]+ $fields[0] Function\n";

SELECT d1.tDefiner, d1.tVariable
FROM defines_var d1;

print "$fields[0]+ $fields[1] Function\n";

Figure 3-3 Patterns for Function Aggregation
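
The effect of these two patterns on the node count can be illustrated with a small sketch. It assumes a toy database with hypothetical contents and uses Python with SQLite in place of the Perl wrapper; the function name function_aggregation is ours.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE components (tName TEXT, tType TEXT);
CREATE TABLE defines_var (tDefiner TEXT, tVariable TEXT);
INSERT INTO components VALUES ('draw', 'Function'), ('i', 'LocalVariable'),
                              ('x', 'LocalVariable'), ('main', 'Function');
INSERT INTO defines_var VALUES ('draw', 'i'), ('draw', 'x');
""")

def function_aggregation(db):
    """Return {composite name: members}, mirroring the two grouping patterns."""
    composites = {}
    # First pattern: every function joins a composite named '<function>+'.
    for (name,) in db.execute(
            "SELECT tName FROM components WHERE tType = 'Function'"):
        composites.setdefault(name + "+", []).append(name)
    # Second pattern: each local variable joins its definer's composite.
    for definer, variable in db.execute(
            "SELECT tDefiner, tVariable FROM defines_var"):
        composites.setdefault(definer + "+", []).append(variable)
    return composites

composites = function_aggregation(db)
nodes_before = db.execute("SELECT COUNT(*) FROM components").fetchone()[0]
nodes_after = len(composites)  # each composite is displayed as a single node
```

Here four nodes collapse into two composites, which is the same kind of reduction, at a much smaller scale, as the node counts reported for UCMEdit and VANISH.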

After this pattern set is applied, the models for UCMEdit and VANISH still appear as inscrutable
webs of nodes and arcs. However, they are simpler than the concrete model of Figure 3-1,
prior to the application of the function aggregation patterns. The UCMEdit model now shows
710 nodes and 2321 relations, and the VANISH model shows 2282 nodes and 6586 relations.

The  next  low-level  pattern  set  applied  is  similar  in  nature  to  that  for  collapsing  functions,  but 
generates  a  much  more  significant  visual  effect.  This  pattern  set  collapses  together  classes 
and  their  member  variables  and  functions,  representing  them  as  a  single  node.  The  resulting 
concrete model for UCMEdit is shown in Figure 3-4; it contains 233 nodes and 518 arcs.

Figure 3-4 The UCMEdit Concrete Model After Collapsing Classes

The VANISH concrete model now contains 798 nodes and 2359 arcs. The dramatic simplification
of the models is to be expected after application of these patterns to object-oriented
systems. Because there are still elements that are not related to
any class in the concrete model, we have already exposed either a deficiency in the extractors
applied or ways in which these systems deviate from pure object-oriented designs. In fact,
both of these cases have occurred.

There are false positives generated by the LSME extraction patterns in the form of apparent
calls to global functions that are actually calls to member functions, and there are several functions
that are indeed global functions, belonging to no class defined in the system. Of course,
some global functions, in the form of system calls or windowing system primitives, are necessary.
How these “leftover” cases are separated from the rest of the architecture is discussed
below.

The  models  for  UCMEdit  and  VANISH  are  now  collections  of  files,  classes,  leftover  functions, 
and  global  variables.  Local  variables  have  been  incorporated  into  the  functions  in  which  they 
are  defined,  and  member  functions  and  member  variables  have  been  incorporated  into  their 
associated  classes.  At  this  point  we  can  compose  global  variables  and  functions  into  the  files 
in  which  they  are  defined,  in  much  the  same  manner  as  functions  and  classes  were  composed. 
The  resulting  models,  shown  in  Figure  3-5,  contain  three  separate  groups  of  elements:  files, 
classes, and the remaining leftover functions.

Figure 3-5 The UCMEdit and VANISH Models Showing (From Top to Bottom) Classes, Files, and
“Leftover” Functions (Arcs Are Hidden)

3.4  Common  Application  Patterns 

Until now, each of the pattern sets applied has been application-independent, but specific to
the extraction techniques and to the domain of C++ software. The next sets of patterns to be
applied use expert knowledge of the UCMEdit and VANISH architectures. At this point the
reconstruction process diverges from a rote analysis, where we apply off-the-shelf patterns, into
opportunistic pattern recognition and definition, leveraging the kinds of information that a
designer or experienced system programmer could be expected to know about a specific
system’s architecture.

The first application-specific knowledge that we apply to our example systems is as follows:

• They are both interactive, graphical applications.

• They both attempt to encapsulate access to the underlying windowing and graphics
subsystem within a layer.

• The functions comprising the graphics libraries used (Xlib, XForms, and Mesa) have
characteristic naming conventions.

These observations lead to the pattern set shown in Figure 3-6, which is intended to identify
the graphics subsystem: those external functions providing rendering and interaction functionality
to the application. (The patterns shown are for UCMEdit; the VANISH patterns are only
slightly different due to a more elaborate encapsulation of the graphics layer.) Consider the
first pattern: It first constructs a new table from the components table by filtering out all functions
that are members of classes (those that appear as the tDefines field in a tuple of the
defines_fn relation). Then the pattern selects from this new table all functions that are
called by functions defined by subclasses of the Presentation class. Note that this pattern
references subclasses of the Presentation class. In doing so, it implicitly identifies the layer
that the original designers created to encapsulate accesses to the graphics subsystem; this
information will be leveraged further below. The second, third, and fourth patterns in this pattern
set identify functions defined by the Mesa, XForms, and Xlib libraries, respectively, by
specifying patterns over the function names.


# 1: Identify calls from graphics access
#    layer.
DROP TABLE tmp;

SELECT * INTO TABLE tmp
FROM components;

DELETE FROM tmp
WHERE tmp.tName=defines_fn.tDefines;

SELECT t1.tName
FROM tmp t1, calls c1, defines_fn d1,
has_subclass s1, has_subclass s2
WHERE t1.tName=c1.tCallee
AND c1.tCaller=d1.tDefines
AND d1.tDefined_by=s1.tSubclass
AND s1.tSuperclass='Presentation';

print "Graphics $fields[0]+ null\n";

# 2: Identify calls to Mesa functions.
SELECT tName
FROM components
WHERE tType='Function'
AND tName LIKE 'gl%';

print "Graphics $fields[0]+ null\n";

# 3: Identify calls to XForms functions.
SELECT tName
FROM components
WHERE tType='Function'
AND tName LIKE 'fl\_%';

print "Graphics $fields[0]+ null\n";

# 4: Identify calls to Xlib functions.
DROP TABLE tmp;

SELECT * INTO TABLE tmp
FROM components;

DELETE FROM tmp
WHERE tmp.tName=defines_fn.tDefines;

SELECT c1.tName
FROM tmp c1
WHERE tType='Function'
AND tName LIKE 'X%';

print "Graphics $fields[0]+ null\n";

Figure 3-6 Patterns for UCMEdit Graphics Subsystem

These patterns collectively identify an architectural component, Graphics. This component
does not exist in the extracted information, but it does exist in the as-designed architecture.
This is an example of linking the as-built and as-designed architectures through a cumulative
series of pattern applications. The results of the application of this pattern set to the UCMEdit
model are shown in Figure 3-7.
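
The name-based patterns (the second through fourth) can be sketched as follows. This is a simplified illustration, not the pattern set itself: it omits the first pattern and the temporary table that filters out member functions, the sample components are hypothetical, and it uses SQLite's GLOB operator because SQLite's LIKE is case-insensitive for ASCII by default, which would let 'X%' match names beginning with a lowercase x.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE components (tName TEXT, tType TEXT);
INSERT INTO components VALUES
    ('glBegin', 'Function'), ('fl_show_form', 'Function'),
    ('XDrawLine', 'Function'), ('xform_helper', 'Function'),
    ('flush', 'Function'), ('XWindow', 'Class');
""")

# Library name -> case-sensitive GLOB pattern over function names.
# In GLOB, '*' is the wildcard and '_' is an ordinary character.
GRAPHICS_PATTERNS = {"Mesa": "gl*", "XForms": "fl_*", "Xlib": "X*"}

def graphics_members(db):
    """Emit one 'Graphics <function>+ null' record per matching function."""
    records = []
    for glob in GRAPHICS_PATTERNS.values():
        rows = db.execute(
            "SELECT tName FROM components "
            "WHERE tType = 'Function' AND tName GLOB ? ORDER BY tName",
            (glob,))
        records += [f"Graphics {name}+ null" for (name,) in rows]
    return records

records = graphics_members(db)
```

Note that flush and xform_helper are not selected, and XWindow is excluded because it is not of type Function; a naming convention is only as reliable as the prefixes are distinctive.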

Note that the names of the elements to be aggregated into the Graphics component include
the ‘+’ that was appended by the patterns in Figure 3-3. This technique thus refers to previously
constructed composite elements without the patterns explicitly querying the database for
the composites. An alternative approach for synchronizing the database with the interaction
component is to populate the database with relations reflecting the compositions and include
these new relations in the pattern queries. We chose to avoid this alternative for efficiency reasons,
but it is supported equally well by Dali.

Figure 3-7 UCMEdit Model Showing the Graphics Subsystem, Classes, Files, and Remaining
Functions (Arcs Are Hidden)

Examining Figure 3-7, we see that there are only two leftover functions remaining: fabs and
[]. The latter is obviously an extraction error, while the former is a math library function that
should have been filtered out along with standard C library and built-in functions. Regardless,
neither of these functions is of interest, so they can be pruned from the model. Identifying the
graphics subsystem for the VANISH model has reduced the number of leftover functions by
more than half, from 259 to 120. These were similarly identified as either extraction errors or
uninteresting low-level functions and pruned from the model.

Rather than identifying uninteresting functions as a side-effect of creating the graphics
subsystem, a more direct approach could have been applied. Such an approach would simply
have selected all functions not contained within a source file of the system or defined by a
class of the system and removed them from the model.
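
One way to express this more direct approach is as a single query, sketched below under stated assumptions: the contains and defines_fn relations are taken to have the column names used in Figure 3-8, the sample rows are hypothetical, and SQLite (driven from Python) stands in for the database used by Dali.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE components (tName TEXT, tType TEXT);
CREATE TABLE contains (tContainer TEXT, tContainee TEXT);
CREATE TABLE defines_fn (tDefined_by TEXT, tDefines TEXT);
INSERT INTO components VALUES ('fabs', 'Function'), ('draw', 'Function'),
                              ('append', 'Function'), ('List', 'Class');
INSERT INTO contains VALUES ('callbacks.cc', 'draw');
INSERT INTO defines_fn VALUES ('List', 'append');
""")

# Functions that are neither contained in a source file of the system
# nor defined by one of its classes are candidates for pruning.
PRUNE_QUERY = """
SELECT c.tName
FROM components c
WHERE c.tType = 'Function'
  AND c.tName NOT IN (SELECT tContainee FROM contains)
  AND c.tName NOT IN (SELECT tDefines FROM defines_fn)
ORDER BY c.tName
"""

prunable = [name for (name,) in db.execute(PRUNE_QUERY)]
```

With this sample data only fabs is selected; draw is kept because it is contained in a source file, and append because it is defined by a class.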

It is important to realize that the determination of which functions are “interesting” or “uninteresting”
is an arbitrary one. An analyst interested in a different aspect of the system, such as
how its subsystems depend on platform- or operating system-specific libraries, would not have
pruned these functions from the concrete model. These functions would more likely be aggregated
into a layer to analyze how they are used by the rest of the application. As we are interested
in constructing an architectural representation of the application-specific part of the
system, we remove these types of functions from the model.

A second common application pattern set takes advantage of knowledge about the relationship
between classes and files in the example applications, thus bridging two architectural
views. First, a source (.cc) file will contain functions for at most one class, and second, a header
(.h) file will contain a definition for at most one class. This makes it possible to define a unique
containment relationship: a class can include the header file in which it is defined and the
source file that contains its functions. The pattern set that generates these aggregations is
shown in Figure 3-8.


SELECT DISTINCT tDefined_by
FROM defines_fn;

print "$fields[0]+ $fields[0]+ Class $fields[0]++\n";

SELECT DISTINCT d1.tDefined_by, c1.tContainer
FROM defines_fn d1, contains c1
WHERE c1.tContainee=d1.tDefines;

print "$fields[0]+ $fields[1]+ Class\n";

SELECT d1.tClass, d1.tFile
FROM defines d1;

print "$fields[0]+ $fields[1] Class\n";

Figure 3-8 Patterns for Class/File Containment
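
The records produced by such patterns have the form “<composite> <member> <type>”, with an optional fourth field that renames the member being incorporated. The sketch below shows one plausible way a consumer could interpret them. It is not a description of how Rigi processes its input; the function apply_records and the containment-map representation are assumptions made for illustration, as are the sample names.

```python
def apply_records(records, parent=None, types=None):
    """Interpret 'composite member type [new_name]' records.

    parent maps each member to the composite that contains it;
    types maps each node name to its type.
    """
    parent = dict(parent or {})
    types = dict(types or {})
    for line in records:
        fields = line.split()
        composite, member, type_name = fields[0], fields[1], fields[2]
        if len(fields) > 3:
            # Optional fourth field: rename the member before incorporating it.
            new_name = fields[3]
            for child, owner in list(parent.items()):
                if owner == member:
                    parent[child] = new_name
            if member in parent:
                parent[new_name] = parent.pop(member)
            if member in types:
                types[new_name] = types.pop(member)
            member = new_name
        if composite == "null":
            types[member] = type_name        # typing record only
        else:
            parent[member] = composite       # aggregation record
            types[composite] = type_name
    return parent, types

# State after class collapsing: the hypothetical class composite 'List+'.
parent, types = apply_records(
    ["List+ List+ Class List++", "List+ list.cc+ Class", "List+ list.h Class"],
    parent={"append": "List+"},
    types={"List+": "Class", "append": "Function"},
)
```

After these three records, the original class composite is named List++ and sits inside a new List+ composite together with its source and header files.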

We see one additional feature of pattern specifications in this example: The last field in the
Perl expression associated with the first pattern ($fields[0]++) specifies a renaming of the
component being incorporated into an aggregate. In this pattern, we are incorporating classes
(named with trailing ‘+’s due to the class-collapsing patterns of Section 3.3) into new composite
components. The names of the new composites are <class>+; the original class composites
are renamed <class>++. The results are shown in Figure 3-9.

Figure 3-9 The UCMEdit and VANISH Models After Application of Common Patterns


3.5  Application-Specific  Patterns 

The patterns applied in Section 3.3 were completely application independent, while those applied
in Section 3.4 were applicable to both systems because of commonalities in their construction.
At this point, the analyses of UCMEdit and VANISH diverge as we apply patterns
specific to each application.

3.5.1  UCMEdit 

UCMEdit was constructed as a prototype, intended to demonstrate the advantages of computer-based
editing of use case maps. The high-level architectural design of the application was
not considered at the start of development; thus, identification of architectural components
from the concrete model must be guided by an understanding of the application’s structure as
it stands at the completion of development. Our understanding of the application will be imposed
on the model via direct manipulation, as follows.

First, we know (and can tell by observation of the model) that callbacks.cc is central to
the structure of the application, containing all of the system’s event handlers and the bulk of
the user interface implementation. Second, we can observe the obvious relationships between
the two remaining files and the classes to which they are connected: interpolate.cc is
associated exclusively with BSpline, and fisheye.cc is used only by Box and Component.
Third, we may now re-apply our knowledge of the structure of the system’s graphics-encapsulation
or presentation layer; it is embodied in the Presentation class and its subclasses.
Fourth, we can make the observation that the List, Listitem, and Listiterator
classes are functionally related to one another and are used by almost all of the other
classes.

We  realize  the  above  observations  in  Dali  by 

• identifying the callbacks.cc file with an architectural component, Interaction

• incorporating interpolate.cc into the BSpline component (we’ll ignore the
observation about fisheye.cc for now)

• aggregating the Presentation class and its subclasses into a Presentation
component

• aggregating the List, Listitem, and Listiterator classes into a List component
and hiding it, treating it as a “utility layer”

The results of these changes to the model are shown in Figure 3-10. At this point, it is necessary
to carefully consider how we may further simplify this model. Automatic clustering based
on graph-theoretic properties, such as interconnection strength, does not provide any insight.
Another option is to attempt to build layers based on the organization generated by the graph
layout algorithm, as shown in Figure 3-10. However, this approach results in little functional
consistency within the layers. Instead, we chose to cluster classes based on the domain of use
case maps. Further discussion addressing how to choose appropriate architectural components
appears in Section 4.


After considering concepts from use case maps, we identified two broad categories of elements:
those related to components and those related to paths, these being the two primary
constructs comprising a use case map. DynamicArrow, Path, Point, Responsibility,
Segment, Stub, and BSpline are related to paths; Box, Component, Dependent, Handle,
and fisheye.cc are related to components. Figure 3-11 shows the effect of clustering
these elements into two architectural components: Path and Component. In probing the connections
between elements, we find that there are still a large number of interrelationships.
While this is not necessarily harmful in itself, it suggests that UCMEdit’s architecture demonstrates
a lack of functional consistency within the elements and their connections.


Unfortunately, there are no significant improvements we can make to the UCMEdit model. The
system was not well designed: the mapping from functionality to software structure is complex.
This makes the abstraction of functionally coherent high-level components within
UCMEdit’s architecture impossible. However, we can take advantage of what we have learned to
suggest improvements to the UCMEdit design.

3.5.2  VANISH 

With VANISH we have a different situation: It was developed following an explicitly documented
architectural design [Kazman 96b]. VANISH is a system for prototyping visualizations and
as such must easily accommodate incorporation of new visualization domains as well as integration
of new presentation toolkits. The Arch metamodel of interactive software [UIMS 92] is
intended to provide exactly these benefits by specifying five architectural components:

•  Functional  Core  -  the  system’s  core  functionality  or  purpose 

•  Functional  Core  Adapter  -  a  mediator  between  the  dialogue  and  functional  core  by 
providing  a  unified,  generic  view  of  the  functional  core  to  the  dialogue 

•  Dialogue  -  a  programmable  mediator  between  domain  specific  and  presentation  specific 
functions 

•  Logical  Interaction  -  a  virtual  interaction  toolkit  layer  that  mediates  between  the  dialogue 
and  the  presentation 

• Presentation - the toolkits that implement the physical interaction between the user and
the application

These components are arranged in a strictly layered fashion, as shown in Figure 3-12. The
Arch metamodel in fact loosely defines an architectural style: a collection of component types
and a set of constraints on their relationships. VANISH’s architecture is a particular instantiation
of this style. Thus, we should be able to identify these architectural components within the
implemented VANISH system and, taking an optimistic attitude, verify that the as-built architecture
conforms to the as-designed architecture.


Figure 3-12 The Arch Metamodel of Interactive Software


The first step towards uncovering VANISH’s architecture is consideration of the files remaining
in the model, as shown on the right side of Figure 3-9. Because we know that VANISH has a
single-process architecture, and we know the name of the executable file, we can determine
which of the remaining files are “interesting” by applying the depends_on relation extracted
from the system’s makefile. Source files that do not contribute to the construction of the executable
can be removed from the model. Examination of these unused files identifies them as
either “dead code,” elements of the application’s earlier versions, or tools used to test particular
aspects of the application’s functionality. This pruning is an example of how one architectural
view can be used to constrain another. Figure 3-13 shows the VANISH model after the removal
of unused files.

Figure 3-13 VANISH Model After Removal of Unused Files

The two remaining files are interpolate.cc and vanish-xforms.cc; the former contains
global functions used exclusively by the BSpline class, and the latter is the main initialization
and event-handling component of the application. Direct manipulation is used to include
interpolate.cc in the BSpline aggregate.

Note that this technique would provide additional insight when manipulating a multiprocess
system: candidate processes could be identified from analysis of the depends_on relation
and used as top-level architectural components. Thus, a runtime “process view” could be
developed.
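
Both the pruning of unused files and the identification of candidate processes reduce to reachability over the depends_on relation. A minimal sketch follows, assuming the relation is available as (target, prerequisite) pairs; that orientation, the executable name vanish, and all file names other than vanish-xforms.cc are hypothetical.

```python
def reachable(depends_on, root):
    """Return everything the root transitively depends on."""
    graph = {}
    for target, prerequisite in depends_on:
        graph.setdefault(target, set()).add(prerequisite)
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for dependency in graph.get(node, ()):
            if dependency not in seen:
                seen.add(dependency)
                stack.append(dependency)
    return seen

# Hypothetical depends_on tuples as they might be derived from a makefile.
DEPENDS_ON = [
    ("vanish", "vanish-xforms.o"), ("vanish-xforms.o", "vanish-xforms.cc"),
    ("vanish", "BSpline.o"), ("BSpline.o", "BSpline.cc"),
    ("test_list", "test_list.cc"),
]
SOURCE_FILES = {"vanish-xforms.cc", "BSpline.cc", "test_list.cc"}

used = reachable(DEPENDS_ON, "vanish")
unused_files = SOURCE_FILES - used
```

For a multiprocess system, running the same traversal once per executable would yield one candidate set of source files per process.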

Now  we  are  ready  to  apply  the  pattern  set  that  identifies  the  top-level  architectural  components 
in  VANISH;  it  appears  in  Figure  3-14  and  defines  architectural  components  as  follows: 

•  The  Logical  Interaction  component  is  composed  of  the  Presentation  class,  its 
subclasses,  and  two  generic  interaction-related  utility  classes:  BSpline  and  Colour. 

•  The  Presentation  component  is  composed  of  classes  that  have  a  superclass  whose 
name  properly  contains  the  substring  “Presentation.” 

• The Functional Core Adapter is composed of the BaseNode class and the
BaseAttribute class and its subclasses.

• The Functional Core is composed of subclasses of the BaseNode class.

• The Dialogue is more complex. In VANISH, it is realized by a visual programming
language. The first pattern for the Dialogue component specifies that all subclasses of
the Primitiveop class are included. The second pattern enumerates the other
elements that make up the visual programming language. This enumeration was
developed by iterative application of this second pattern, starting with an initial guess at
which classes should be included and refining the enumeration as additional classes were
identified.

• A final Utilities layer is composed of a set of generic list manipulation classes used
by many other classes in the system.


SELECT tSubclass
FROM has_subclass
WHERE tSuperclass='Presentation';

print "Logical_Interaction $fields[0]+ null\n";

SELECT tName
FROM components
WHERE tName='Presentation'
OR tName='BSpline'
OR tName='Colour';

print "Logical_Interaction $fields[0]+ null\n";

SELECT s1.tSubclass
FROM has_subclass s1
WHERE s1.tSuperclass ~ 'Presentation'
AND s1.tSuperclass !~ '^Presentation$';

print "Presentation $fields[0]+ null\n";

SELECT tName
FROM components
WHERE tName='BaseNode'
OR tName='BaseAttribute';

print "Functional_Core_Adapter $fields[0]+ null\n";

SELECT tSubclass
FROM has_subclass
WHERE tSuperclass='BaseAttribute';

print "Functional_Core_Adapter $fields[0]+ null\n";

SELECT tSubclass
FROM has_subclass
WHERE tSuperclass='BaseNode';

print "Functional_Core $fields[0]+ null\n";

SELECT tSubclass
FROM has_subclass
WHERE tSuperclass='Primitiveop';

print "Dialogue $fields[0]+ null\n";

SELECT tName
FROM components
WHERE tName='vanish-xforms.cc'
OR tName='Primitiveop'
OR tName='Mapping'
OR tName='MappingEditor'
OR tName='MappingLibrary'
OR tName='Attributes'
OR tName='Application'
OR tName='Renderer'
OR tName='InputValue'
OR tName='Point'
OR tName='VEC'
OR tName='MAT'
OR ((tName ~ 'Dbg$' OR tName ~ 'Event$')
AND tType='Class');

print "Dialogue $fields[0]+ null\n";

SELECT tName
FROM components
WHERE (tName ~ '^List'
OR tName ~ 'Map$' OR tName ~ 'Socket')
AND tType='Class';

print "Utilities $fields[0]+ null\n";

Figure 3-14 Patterns for VANISH Architecture
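
Several of these patterns rely on regular-expression matching over names rather than exact equality. The sketch below illustrates three of them in Python, with the re module standing in for the database's pattern-matching operator. The sample tuples are hypothetical (only AmPresentation, MIFPresentation, BSpline, Colour, and Mapping are names from the text), and the function names are ours.

```python
import re

# Hypothetical (superclass, subclass) tuples from a has_subclass relation.
HAS_SUBCLASS = [
    ("Presentation", "GraphPresentation"),
    ("GraphPresentation", "AmPresentation"),
    ("GraphPresentation", "MIFPresentation"),
    ("BaseNode", "FileNode"),
]
# Hypothetical (name, type) tuples from the components table.
COMPONENTS = [("List", "Class"), ("ListItem", "Class"), ("HashMap", "Class"),
              ("Mapping", "Class"), ("Socket", "Class"), ("ListAll", "Function")]

def logical_interaction(has_subclass):
    """Presentation, its immediate subclasses, and two utility classes."""
    members = {"Presentation", "BSpline", "Colour"}
    members |= {sub for sup, sub in has_subclass if sup == "Presentation"}
    return members

def presentation(has_subclass):
    """Classes whose superclass name properly contains 'Presentation'."""
    return {sub for sup, sub in has_subclass
            if re.search("Presentation", sup)
            and not re.search("^Presentation$", sup)}

def utilities(components):
    """Classes matching the list-manipulation naming patterns."""
    pattern = re.compile(r"^List|Map$|Socket")
    return {name for name, type_name in components
            if type_name == "Class" and pattern.search(name)}
```

Note that Mapping is not captured by the Map$ pattern, which is consistent with its explicit enumeration in the Dialogue pattern.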

Also, the Graphics component identified in Section 3.4 is incorporated into the Presentation
component. The resulting model, with the Utilities component both shown and hidden,
is presented in Figure 3-15.

Figure 3-15 The VANISH Architecture (With and Without Utilities Layer)

We are now in a position to consider how well VANISH’s as-built architecture conforms to its
as-designed architecture. We can see immediately that the basic arch shape is present; there
are no connections between components on opposite sides of the arch. However, it is also
immediately apparent that there are several instances of layer bridging in the architecture:
between the Dialogue and the Functional Core, and between the Dialogue and the
Presentation. These are architectural deviations. We can identify three classes of deviations:

•  acceptable  -  Although  the  architecture  was  not  designed  with  a  particular  feature,  the 
feature  does  not  degrade  the  conceptual  integrity  [Brooks  75]  of  the  architecture.  The 
architectural  description  against  which  conformance  is  being  tested  does  not  need  to  be 
updated  to  reflect  the  feature. 

•  exceptions  -  As  for  acceptable  deviations,  exceptions  do  not  degrade  the  conceptual 
integrity  of  the  architecture.  Exceptions  should  be  incorporated  into  the  model  against 
which  conformance  is  being  tested. 

•  opportunities  for  improvement  -  The  architecture’s  conceptual  integrity  is  degraded  by 
the  deviation.  The  implementation  should  be  modified  to  remove  it. 
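
Detecting layer bridging of the kind described above can be automated once the components are known. The following sketch assumes the five Arch components form a strict linear ordering and that the component-level connections are available as pairs; the connection list is hypothetical and the function layer_bridges is ours. Classifying each reported deviation as acceptable, an exception, or an opportunity for improvement remains a judgment for the analyst.

```python
# The five Arch components in layer order, from one foot of the arch to the other.
ARCH_LAYERS = ["Functional_Core", "Functional_Core_Adapter", "Dialogue",
               "Logical_Interaction", "Presentation"]

def layer_bridges(connections, layers=ARCH_LAYERS):
    """Return connections that skip at least one layer of the ordering."""
    index = {name: position for position, name in enumerate(layers)}
    bridges = []
    for source, target in connections:
        # Components outside the layering (e.g., Utilities) are not checked.
        if source in index and target in index:
            if abs(index[source] - index[target]) > 1:
                bridges.append((source, target))
    return bridges

# Hypothetical component-level connections for illustration.
CONNECTIONS = [
    ("Dialogue", "Functional_Core_Adapter"),
    ("Dialogue", "Functional_Core"),
    ("Dialogue", "Presentation"),
    ("Logical_Interaction", "Presentation"),
    ("Dialogue", "Utilities"),
]

bridges = layer_bridges(CONNECTIONS)
```

With this sample input the check reports the two cases of bridging named in the text: Dialogue to Functional Core and Dialogue to Presentation.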

We can consider the existence of the Utilities layer an acceptable deviation: It does not
affect the conceptual integrity of the intended architecture, nor does it contribute to its overall
structure. For this reason it should not be incorporated into the architectural description.

The connections between the Presentation and the Dialogue are exceptions that should
be documented in the system’s architectural description. The connection from the
Presentation to the Dialogue represents classes in the Dialogue directly instantiating classes
in the Presentation (has_instance relations). This appears to be a violation of the
constraints of the Arch model. However, this violation is necessary due to the way the Logical
Interaction was implemented in VANISH. The Logical Interaction layer (the
Presentation class and its immediate subclasses) provides an abstract interface to the
concrete classes in the Presentation component: AmPresentation, MIFPresentation,
xfPresentation, and so forth (see Figure 3-16). For the Dialogue to take advantage
of this abstraction layer, it must directly create instances of the concrete classes.
Thereafter it refers to an abstract class from the Logical Interaction layer, and these references
are resolved via polymorphism into calls to the appropriate concrete class. Therefore, the
architectural implications of this layer bridging are constrained.


Figure 3-16: Presentation and Its has_subclass Descendants

The connection from the Dialogue to the Presentation represents calls to the constructors
of the classes mentioned above. This, then, is an exception for the same reasons.

The structure underlying the Functional Core Adapter and the Functional Core is
much like that of the Logical Interaction and the Presentation; the BaseNode class
(in the Functional Core Adapter) provides an abstract interface to the concrete classes
that make up the Functional Core. Thus, the connection from the Dialogue to the
Functional Core is identical in nature to that from the Dialogue to the Presentation; it is
made up of calls to constructors of the concrete classes. This is also a justifiable architectural
exception.

Finally, the connection from the Functional Core to the Dialogue must be examined. One
might think that it is analogous to the connection from the Presentation to the Dialogue,
for the reasons described above. However, probing the connection shows that the relations
that form it are not has_instance relations because no class within the Dialogue maintains
an instance of a class from the Presentation. Instead, these relations are calls from classes
in the Functional Core to classes in the Dialogue. These calls expose the Functional
Core to the details of the Dialogue, bypassing the stated intent of the Functional Core
Adapter: to keep these components isolated. This is an opportunity for improvement, as it
degrades the conceptual integrity of the connection between the Dialogue and the Functional
Core Adapter. Also, unnecessary coupling between the components will hamper integration
of new visualization domains, which is central to the stated objectives for VANISH.

Analysis of architectural conformance, such as that performed above, can be supported by an
automatic tool such as RMTool [Murphy 95]. We use RMTool by giving it a description of an
architectural representation of a concrete model (as reconstructed using Dali) as well as an
as-designed architecture, and it computes the conformance of the former to the latter. This
provides a documented starting point for examining architectural deviations.
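
The kind of comparison described above can be sketched as a difference between two sets of
component-level connections. The Python fragment below is a minimal illustration only; it is
not RMTool's interface. The function name compare_architectures and the sample connections
are assumptions made for this example (the connection lists are loosely modeled on the
VANISH discussion, not extracted data), and the labels convergence, divergence, and absence
are borrowed from the reflexion-model vocabulary.

```python
def compare_architectures(as_designed, as_built):
    """Compare two collections of (source, target) component connections.

    Hypothetical helper: returns the connections that appear in both models
    (convergences), only in the as-built model (divergences), and only in
    the as-designed model (absences).
    """
    designed = set(as_designed)
    built = set(as_built)
    return {
        "convergences": sorted(designed & built),
        "divergences": sorted(built - designed),
        "absences": sorted(designed - built),
    }


# Illustrative data only: an Arch-style chain of adjacent layers ...
as_designed = [
    ("Presentation", "Logical Interaction"),
    ("Logical Interaction", "Dialogue"),
    ("Dialogue", "Functional Core Adapter"),
    ("Functional Core Adapter", "Functional Core"),
]
# ... plus three layer-bridging connections in the as-built model.
as_built = as_designed + [
    ("Dialogue", "Presentation"),
    ("Dialogue", "Functional Core"),
    ("Functional Core", "Dialogue"),
]

report = compare_architectures(as_designed, as_built)
for source, target in report["divergences"]:
    print(f"deviation to classify: {source} -> {target}")
```

Each divergence reported this way still has to be classified by the analyst as acceptable,
an exception, or an opportunity for improvement; the comparison only produces the list.
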

4 Assessing the Evidence: Why Not Any Four Boxes?

What does it mean to reconstruct the software architecture of a system? We started off this
paper with the claim that software architecture was, in some ways, a mass hallucination, that
it does not really exist in anything that you can examine directly. That is, the abstractions that
are commonly used in architectural representations are typically not those found in the
system’s source artifacts. Software designers create an architecture and then hope that what is
implemented properly reflects what was designed.

The problem for a software analyst attempting to understand and assess an architecture is
therefore one of being a software detective: sifting through clues and putting evidence together
in coherent patterns. To use a different metaphor, it seems that you could mold any number
of software architectures out of the amorphous clay that is a concrete model. So, why can’t we
just recursively group extracted information together until we have achieved a picture with four
boxes and a few connections?

How  do  we  know  what  is  a  good,  meaningful  architecture?  This  is  really  asking  two  questions: 

•  What  makes  a  “good”  architecture? 

•  How  do  we  know  when  we’ve  found  it? 

Our guide in reconstruction is that a good architecture exhibits conceptual integrity [Brooks
75]: it is built from a small number of components that are connected in regular ways. But this
is only half of the battle. There should be a consistent allocation of functionality to the
architecture’s components and connectors [Kazman 94]. Thus, any four boxes are probably not a
good architecture because the components of those boxes have little to do with each other in
terms of the system’s functional allocation.

We have used this rule of thumb as our guide in deciding when to group components together
in the examples in Section 3. Consider the application-independent patterns (presented in
Section 3.3). These patterns group together components that are functionally coherent,
irrespective of the application. Grouping a class’s member functions and variables together is one
example of this.
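
As a rough illustration of this kind of application-independent grouping, the Python sketch
below lifts relations between functions to relations between the classes that own them.
It is not Dali's implementation (Dali expresses such patterns as queries over its database);
the function lift_to_classes and all element names in the sample data are hypothetical.

```python
def lift_to_classes(member_of, calls):
    """Lift function-level call relations to class-level relations.

    member_of: dict mapping a member function or variable to its owning class.
    calls: iterable of (caller, callee) pairs between source elements.
    Elements with no owning class are kept under their own name.
    """
    lifted = set()
    for caller, callee in calls:
        source = member_of.get(caller, caller)
        target = member_of.get(callee, callee)
        if source != target:  # relations internal to a class are hidden
            lifted.add((source, target))
    return lifted


# Hypothetical source model.
member_of = {
    "Node::draw": "Node",
    "Node::position": "Node",
    "Canvas::line": "Canvas",
}
calls = [
    ("Node::draw", "Canvas::line"),
    ("Node::draw", "Node::position"),
    ("main", "Node::draw"),
]

class_relations = lift_to_classes(member_of, calls)
print(sorted(class_relations))
```

The grouping is functionally coherent regardless of the application, because a class's
members are collapsed only into the class that declares them.
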

The common application patterns for VANISH show an application-dependent version of this
principle; in Section 3.4 we showed how VANISH’s Presentation layer consists of all and
only those classes that make calls to functions in Mesa, XForms, Xlib, etc. The
Presentation layer is functionally coherent, which is to say that the functional decomposition
provided by the Arch metamodel is respected by the structure of VANISH; it consists of all and
only presentation functionality.
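
The "all and only" criterion can be made concrete with a small selection over call relations.
The Python sketch below is an assumption-laden illustration, not the pattern used in
Section 3.4: the function classes_calling, the class names, and the mapping of callee
functions to libraries are all invented for the example.

```python
def classes_calling(calls, defined_in, libraries):
    """Return the classes that call at least one function from `libraries`.

    calls: iterable of (calling_class, callee_function) pairs.
    defined_in: dict mapping a callee function to the library defining it.
    libraries: set of library names that characterize the layer.
    """
    return {
        calling_class
        for calling_class, callee in calls
        if defined_in.get(callee) in libraries
    }


# Hypothetical data: which library defines each called function.
defined_in = {
    "XDrawLine": "Xlib",
    "fl_show_form": "XForms",
    "glBegin": "Mesa",
    "strlen": "libc",
}
calls = [
    ("ViewA", "XDrawLine"),
    ("ViewB", "fl_show_form"),
    ("ViewB", "strlen"),
    ("ModelC", "strlen"),
]

presentation = classes_calling(calls, defined_in, {"Mesa", "XForms", "Xlib"})
print(sorted(presentation))
```

A class that calls only non-graphics functions (ModelC here) stays outside the component,
which is what keeps the resulting layer functionally coherent.
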

By way of contrast, the structure of UCMEdit does not show a consistent allocation of
functionality onto structure, which is why, even though we can build a relatively simple
architecture for UCMEdit, this architecture does us no good. We, as analysts, get little insight
into the system by looking at this simple structure, because the components and their
interconnections are not functionally consistent.


CMU/SEI-97-TR-010 




5  Related  Work 

Dali owes much to Murphy’s work on LSME [Murphy 96b] and RMTool [Murphy 95]. LSME
provides a substantial amount of the source model extraction to populate Dali’s database. RMTool
has the same goals as Dali, but with a narrower scope, extracting only from source artifacts
and lacking the tightly coupled user interaction. More importantly, RMTool provides a language
of limited expressive power for construction of architectural components from source
elements. As mentioned earlier, RMTool has been incorporated as one of the analysis options in
Dali.

Another system that is closely related to Dali is ManSART [Yeh 97], a tool for recovering
architectural information and manipulating views of that information. Dali and ManSART are
similar in the following ways:

•  Both  systems  have,  as  their  goals,  the  reconstruction  of  architectural  information  from 
lower-level  sources. 

•  A  view,  in  ManSART  terms,  is  generated  by  an  operator  operating  over  the  results  of 
some  recognizer.  This  is  similar  to  our  notion  of  a  view  being  generated  by  one  or  more 
patterns  (SQL  queries)  operating  over  a  table  populated  by  some  extraction  tool. 

•  Both  systems  visualize  the  resulting  information  and  provide  some  means  for  an  architect 
to  interact  with  extracted  and  derived  information. 

Dali  and  ManSART  are  different  in  some  important  ways  as  well: 

•  ManSART  recognizers  depend  exclusively  on  parsing  and  the  production  of  an  Abstract 
Syntax  Tree  (via  REFINE/C  [Reasoning  97]).  Therefore  it  is  harder  for  ManSART  to  switch 
languages/dialects  because  it  has  to  integrate  an  appropriate  parser  into  the  system  and 
it  requires  compilable  code. 

•  ManSART  concentrates  on  the  code  as  the  artifact  of  interest;  it  does  not  appear  to  extract 
non-language  artifacts  (such  as  makefiles  or  profiling  information). 

•  ManSART lacks the notion of combining the same kind of information from multiple
sources in a fault-tolerant manner.

•  Our architecture is generic: It does not assume the existence of particular extraction,
visualization, or analysis components. We store information in a standard relational
database, and we can thus accept input from any source, as long as it creates a table in
the database. We could, in principle, integrate ManSART as a part of a Dali workbench.

•  ManSART’s  approach  is  more  “heavyweight”  in  terms  of  computation  and  infrastructure. 
Recognizers  are  “complex  combinations  of  feature  extractors  and  view  manipulators” 
[Yeh  97].  Dali,  on  the  other  hand,  extracts  everything  and  lets  the  user  iteratively  and 
incrementally  build  up  a  view  through  opportunistic  pattern  recognition  and  definition.  In 
this  way,  our  approach  is  more  “lightweight.” 

Dali is also related to the Desire system, developed at MCC [Biggerstaff 89]. Desire was
developed with the goal of recovering detailed design information from implemented systems to
support maintenance and the harvesting of reusable components. To represent a system,
Desire constructs a dictionary of system constructs and applies Prolog queries to probe the
dictionary. This is analogous to Dali’s use of a repository. Both Desire and Dali apply pattern
matching to identify constructs in a model of the system under study. The central element of
Desire is a domain model that stores conceptual abstractions to be matched. These
conceptual abstractions contain idioms made up of linguistic and structural patterns. Conceptually,
both Dali and Desire recognize the importance of an expert. Desire relies on the knowledge of
an expert to identify the conceptual abstractions that populate the domain model.

However, there are some differences between Desire and Dali. While Desire is focused on the
recovery of detailed design information, Dali’s goal is the construction of high-level
architectural representations that can be used to evaluate properties of the system as a whole. Desire
has an additional goal of harvesting reusable components. In contrast, Dali identifies
architectural constructs. Also, Desire is not intended to be open or lightweight. It depends on its
“out of the box” functionality to meet its goals. As with ManSART, a Dali workbench could
incorporate Desire as appropriate to the analysis task at hand.

6 Post Mortem


This paper has presented the Dali workbench for architectural extraction and reconstruction.
The tool was born out of our need to have something to examine when we do architectural
analyses. Frequently we are asked to analyze a system’s software architecture and are given
only its code and the limited time of a designer.

This tool goes a long way in helping an analyst reconstruct a software architecture from
various artifacts such as source code, makefiles, and profiling information. Dali allows an analyst
to interact with the recovered information by assessing the results of the reconstruction effort
to see whether composite elements demonstrate functional consistency, and by seeing places
where the as-built architecture differs from the as-designed architecture.

We have not attempted a “big-bang” solution here. One of our emphases has been to provide
an open, lightweight environment so that tools can be integrated opportunistically. We believe
that no single tool is right for all jobs. Certainly, extraction demands different tools for different
languages and styles of systems. Our other emphasis is that no extraction technique is useful
or complete without user interaction. In some respects, software architecture is a mass
hallucination, but a convenient and useful one that is created by and for humans. So a human must
be part of the recovery process, interpreting evidence and creating and testing theories.

7 The Next Case: Where Do We Go From Here?

Dali  is  still  an  experimental  system.  In  the  near  term,  there  are  three  directions  in  which  we 
want  to  extend  this  work:  extend  its  scope,  improve  its  extraction  capabilities,  and  improve  its 
user  interaction.  We  will  discuss  each  of  these  directions  briefly. 

We have applied Dali to only half a dozen systems, all of which used C or C++ dialects. Our
intention is to extend Dali’s scope by applying it to the extraction and analysis of larger
systems and other languages (particularly legacy COBOL and Fortran systems). This means
augmenting Dali’s extraction capabilities with tools and techniques appropriate to other
languages. Similarly, we want to augment Dali’s analysis capabilities by integrating other tools.
One avenue is to export and import an ACME [Garlan 97] representation of the architecture
from Dali. We view these enhancements of Dali as relatively rote and easily achieved.

An area of more speculative research is the improvement of Dali’s capabilities for combining
extracted information. Extraction is an error-prone process [Murphy 96a]. We realize
that no extraction tool is going to reliably retrieve everything of interest about an architecture.
So, when dealing with noisy, error-prone input, a sensible thing to do is to not rely on any single
source of information. We must do multiple extractions and combine them intelligently. The
meaning of “intelligently” is the research area here. There are several possibilities. After
characterizing the trustworthiness of individual techniques in extracting particular source elements,
the techniques would be used to populate only those database relations for which they are
“trusted.” Additionally, we can take a fault-tolerance view of the problem and have each of the
extraction techniques vote on a particular source element. We would only accept the
information into the database when a majority of the relevant extraction techniques agree. Or, we
could use a hybrid of these two techniques, where the vote of each extraction technique is
weighted according to the trustworthiness of that technique.
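
The voting and hybrid schemes outlined above can be sketched in a few lines of Python. This
is a speculative illustration, not an existing Dali feature: the function
combine_extractions, the extractor names, the trust weights, and the extracted facts are all
invented, and the sketch assumes that every listed extractor is relevant to every fact.

```python
def combine_extractions(reports, weights=None, threshold=0.5):
    """Accept a fact when its weighted share of votes exceeds `threshold`.

    reports: dict mapping an extractor name to the set of facts it extracted.
    weights: optional dict mapping an extractor name to a trust weight;
             extractors without an entry get weight 1.0 (plain majority vote).
    """
    weights = weights or {}
    total = sum(weights.get(name, 1.0) for name in reports)
    if total == 0:
        return set()
    votes = {}
    for name, facts in reports.items():
        weight = weights.get(name, 1.0)
        for fact in facts:
            votes[fact] = votes.get(fact, 0.0) + weight
    return {fact for fact, score in votes.items() if score / total > threshold}


# Hypothetical extractors reporting (source, relation, target) facts.
reports = {
    "parser": {("f", "calls", "g"), ("g", "calls", "h")},
    "lexical": {("f", "calls", "g"), ("f", "calls", "x")},
    "profiler": {("f", "calls", "g"), ("g", "calls", "h")},
}

majority = combine_extractions(reports)
weighted = combine_extractions(reports, weights={"lexical": 3.0})
print(sorted(majority))
print(sorted(weighted))
```

With equal weights this reduces to a simple majority vote; raising one extractor's weight
lets a trusted technique outvote the others, which is the hybrid described above.
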

Another way of making Dali more useful is to improve its interaction with the user. This, once
again, has a near-term and a long-term aspect. In the near term, the addition of a history
mechanism with the option of playback will make a user braver in taking exploratory excursions,
knowing that they can always be undone. Providing user-guided pattern inferencing (where a
user modifies the architecture through direct manipulation and the system infers the
architectural rules from this interaction) is our long-term user interaction goal.

Finally, we can envision using Dali as a tool for guiding architectural evolution; for example, in
determining how difficult it would be to change an architecture’s connection mechanisms. This
might be used to make a legacy system “Web-enabled,” or to distribute it using CORBA.